Block-Level Audio Features for Music Genre Classification
Abstract
While frame-level audio features such as MFCCs, combined with the bag-of-frames (BOF) approach, have been used widely and successfully, our submission uses a block processing framework instead. In general, block-level features have the advantage that they can capture more temporal information than BOF approaches. We introduce two novel spectral patterns closely related to the spectrum histogram, and propose a modified version of the well-known fluctuation patterns. Based on these patterns, we train a support vector machine to classify songs into different categories.
1. AUDIO PREPROCESSING
We use the Java-based audio signal analysis toolbox CoMIRVA (Collection of Music Information Retrieval and Visualization Applications) [1]. This library takes care of decoding and resampling any input audio file to 22 kHz raw PCM. A maximum of four minutes, starting from the beginning of each audio file, is decoded, and the central two minutes of the decoded signal are analyzed. The signal is transformed to the frequency domain by a Short Time Fourier Transform (STFT) with a window size of 2048 samples, a hop size of 512 samples and a Hanning window. Finally, we compute the magnitude spectrum.
1.1 Cent-Scale
We account for the musical nature of the audio signals by mapping the magnitude spectrum with linear frequency resolution onto a logarithmic musical scale, the Cent-scale [7]. We do so by summing all frequency bins of the linear-resolution magnitude spectrum that fall within a constant bandwidth of 100 cent, starting from 2050 cent (about 53.43 Hz). The resulting spectral feature vectors have 97 dimensions. This results in a linear frequency resolution up to about 430 Hz; above that, the higher frequency content is compressed logarithmically (see figure 1). We transform the compressed magnitude spectrum according to the above equation to obtain a logarithmic scale.
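The band summation described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the CoMIRVA implementation. The reference frequency `CENT_REF_HZ` (about 16.35 Hz) is not given in the text; it is assumed here because it places 2050 cent at roughly 53.43 Hz, which agrees with the value quoted above. The exact sample rate of 22050 Hz, the function names and the band-edge convention are assumptions as well.

```python
import math

SAMPLE_RATE = 22050   # Hz; the text says "22 kHz", the exact value is assumed
FFT_SIZE = 2048       # STFT window size given in the text
FIRST_CENT = 2050.0   # lower edge of the first band, from the text
BAND_CENT = 100.0     # constant bandwidth per band, from the text
# Assumed reference frequency (about 16.35 Hz); not stated in the text.
CENT_REF_HZ = 440.0 * 2.0 ** (3.0 / 12.0 - 5.0)


def hz_to_cent(freq_hz):
    """Convert a frequency in Hz to cents above the assumed reference."""
    return 1200.0 * math.log2(freq_hz / CENT_REF_HZ)


def cent_to_hz(cent):
    """Inverse of hz_to_cent."""
    return CENT_REF_HZ * 2.0 ** (cent / 1200.0)


def cent_scale(magnitudes):
    """Sum the bins of one magnitude spectrum into 100-cent bands."""
    nyquist_cent = hz_to_cent(SAMPLE_RATE / 2.0)
    n_bands = int((nyquist_cent - FIRST_CENT) // BAND_CENT) + 1
    bands = [0.0] * n_bands
    bin_hz = SAMPLE_RATE / FFT_SIZE  # spacing of the linear frequency bins
    for k, magnitude in enumerate(magnitudes):
        if k == 0:
            continue  # the DC bin has no position on a logarithmic scale
        cent = hz_to_cent(k * bin_hz)
        if cent < FIRST_CENT:
            continue  # below the first band
        band = int((cent - FIRST_CENT) // BAND_CENT)
        if band < n_bands:
            bands[band] += magnitude
    return bands


# Usage: map one flat spectrum frame (FFT_SIZE // 2 + 1 bins) to cent bands.
frame = [1.0] * (FFT_SIZE // 2 + 1)
bands = cent_scale(frame)
print(round(cent_to_hz(FIRST_CENT), 2), len(bands))
```

With these assumptions some of the lowest bands stay empty, because neighboring FFT bins are more than 100 cent apart at low frequencies, and the number of bands produced need not match the 97 dimensions reported above. The original mapping evidently treats the band edges or the low-frequency range differently.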
Altogether, the mapping onto the Cent-scale is a fast approximation of a constant-Q transform, but with a constant window length for all frequency bins.
Figure 1: Spectrogram with linear frequency resolution (upper illustration) and the cent-scaled equivalent (lower illustration).
1.2 Audio Normalization
Audio files are recorded at different volume levels. Technically, this means that the whole audio signal is amplified by a constant factor. Because the Fourier transform is linear, the magnitude spectrum of the amplified signal is scaled by the same constant factor. As we process all audio blocks on a logarithmic amplitude scale (in dB), the amplified magnitude spectrum (in dB) is offset by a constant. For some features it can be advantageous to be loudness invariant, so we perform an audio normalization.
In some audio applications this is achieved by a simple frame-by-frame mean removal. Removing the mean of each frame makes the spectral representation invariant to the constant offset, but the local loudness information is lost, as all frames then have zero mean; the only information left is the spectral envelope of the audio frame. To keep some local loudness information while still making the whole audio signal loudness invariant, we estimate the constant offset of a frame not from that single frame alone but from a fixed-size neighborhood around it (±100 frames in our experiments). From each frame we remove the mean of its neighborhood.
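The neighborhood-based mean removal can be illustrated with a short Python sketch. It assumes the input is a list of frames that are already on a dB scale, and it truncates the neighborhood at the start and end of the signal; the text does not say how borders are handled, so that choice, like the function name `normalize_loudness`, is an assumption.

```python
def normalize_loudness(frames, radius=100):
    """Subtract from each frame the mean of its +/- radius frame neighborhood.

    frames: list of equally long lists holding dB-scaled spectral values.
    The neighborhood is truncated at the signal borders (an assumption).
    """
    frame_sums = [sum(frame) for frame in frames]
    frame_sizes = [len(frame) for frame in frames]
    normalized = []
    for i, frame in enumerate(frames):
        lo = max(0, i - radius)
        hi = min(len(frames), i + radius + 1)
        # Mean over every value in the neighborhood, not over frame means.
        mean = sum(frame_sums[lo:hi]) / sum(frame_sizes[lo:hi])
        normalized.append([value - mean for value in frame])
    return normalized


# Usage: a quiet frame followed by a loud frame, and the same signal
# with a constant 6 dB offset added to every value.
quiet_then_loud = [[0.0, 0.0], [10.0, 10.0]]
amplified = [[v + 6.0 for v in frame] for frame in quiet_then_loud]
print(normalize_loudness(quiet_then_loud, radius=1))
print(normalize_loudness(amplified, radius=1))
```

Both calls give the same result, since the constant offset cancels, while the difference between the quiet and the loud frame is preserved. With `radius=0` the sketch reduces to plain frame-by-frame mean removal, and every frame ends up with zero mean.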